Norwalk
Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
Yuan, Jiaqing; Pan, Lin; Hang, Chung-Wei; Guo, Jiang; Jiang, Jiarong; Min, Bonan; Ng, Patrick; Wang, Zhiguo
Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall, as pretraining-only models consistently outperform their instruction-tuned counterparts, and that model scaling has a positive effect, as larger models outperform smaller ones in all model families. However, the best performance, from GPT-4, still leaves a large gap to the upper bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling model known and unknown knowledge, we find that the degradation is attributable to exemplars that contradict a model's known knowledge, as well as to the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial, and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (8 more...)
Tesla on autopilot smacked into Florida Highway Patrol cruiser that stopped to help disabled vehicle
A Tesla Model 3 driving on 'autopilot' smacked into a Florida Highway Patrol cruiser on Saturday morning, narrowly missing the driver of the cruiser, who had stopped to help a disabled vehicle. The incident is the 12th such smash involving a Tesla on autopilot mode and an emergency vehicle. All the cars which have been struck had their lights flashing, or had deployed an emergency flare, illuminated warning sign or cones, raising questions about whether they may have confused the Tesla's sensors. Saturday's smash happened when the 28-year-old trooper, who has not been named, stopped shortly after 5 am on August 28 on I-4 near downtown Orlando while responding to a broken down car. He put on his emergency lights and was walking over to the disabled vehicle when the Tesla hit the cruiser's left side, according to a copy of the police report seen by DailyMail.com.
- North America > United States > Texas > Montgomery County (0.15)
- North America > United States > North Carolina > Mecklenburg County > Charlotte (0.15)
- North America > United States > Connecticut > Fairfield County > Norwalk (0.15)
- (12 more...)
- Transportation > Ground > Road (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
U.S. Opens Investigation Into Tesla's Autopilot Driving System
The U.S. government has opened a formal investigation into Tesla's Autopilot partially automated driving system after a series of collisions with parked emergency vehicles. The investigation covers 765,000 vehicles, almost everything that Tesla has sold in the U.S. since the start of the 2014 model year. In the crashes identified by the National Highway Traffic Safety Administration as part of the probe, 17 people were injured and one was killed. NHTSA says it has identified 11 crashes since 2018 in which Teslas on Autopilot or Traffic Aware Cruise Control have hit vehicles at scenes where first responders have used flashing lights, flares, an illuminated arrow board or cones warning of hazards. The agency announced the action Monday in a posting on its website.
- North America > United States > Texas > Montgomery County (0.05)
- North America > United States > North Carolina > Mecklenburg County > Charlotte (0.05)
- North America > United States > Michigan > Ingham County > Lansing (0.05)
- (11 more...)
- Transportation > Ground > Road (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Tesla's Autopilot faces US investigation after crashes with emergency vehicles
The US government has opened a formal investigation into Tesla's Autopilot partially automated driving system after a series of collisions with parked emergency vehicles. The investigation covers 765,000 vehicles, almost everything that Tesla has sold in the US since the start of the 2014 model year. In the crashes identified by the National Highway Traffic Safety Administration (NHTSA) as part of the investigation, 17 people were injured and one was killed. NHTSA says it has identified 11 crashes since 2018 in which Teslas on Autopilot or Traffic Aware Cruise Control have hit vehicles at scenes where first responders used flashing lights, flares, an illuminated arrow board or cones warning of hazards. The agency announced the action on Monday in a posting on its website.
- North America > United States > Texas > Montgomery County (0.05)
- North America > United States > North Carolina > Mecklenburg County > Charlotte (0.05)
- North America > United States > Michigan > Ingham County > Lansing (0.05)
- (10 more...)
- Transportation > Ground > Road (1.00)
- Government > Regional Government > North America Government > United States Government (0.94)
Real-life RoboCop was at the scene of a crime. Then it moved on.
When a fight broke out recently in the parking lot of Salt Lake Park, a few miles south of downtown Los Angeles, Cogo Guebara did what seemed the most practical thing at the time: she ran over to the park's police robot to push its emergency alert button. "I was pushing the button but it said, 'step out of the way,'" Guebara said. "It just kept ringing and ringing, and I kept pushing and pushing." She thought maybe the robot, which stands about 5 feet tall and has "POLICE" emblazoned on its egg-shaped body, wanted a visual of her face, so she crouched down for the camera. Without a response, Rudy Espericuta, who was with Guebara and her children at the time, dialed 911.
- North America > United States > California > Los Angeles County > Los Angeles (0.25)
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > California > Los Angeles County > Norwalk (0.05)
- North America > United States > California > Los Angeles County > Huntington Park (0.05)